Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)


Figure 5.19 shows that samples are clustered into tumor and normal samples based on the profiles of the genes in these samples.

5.3.7.8 Model Fitting

Once we have computed the dispersion estimates, we can use them to fit the negative binomial generalized linear model (GLM) and then carry out the tests for differential expression. edgeR has two functions for fitting RNA-Seq count data to GLMs: the “glmQLFit” function, which fits a quasi-likelihood negative binomial generalized log-linear model, and the “glmFit” function, which fits a negative binomial generalized log-linear model. The difference between the two is that “glmQLFit” uses the trended negative binomial dispersion for fitting and then estimates the quasi-likelihood dispersion from the deviance, while “glmFit” uses the tagwise negative binomial dispersion for model fitting. You can use either of them to fit the count data. Run the following to fit the count data to the quasi-likelihood negative binomial model:

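# Fit the quasi-likelihood negative binomial GLM using the design matrix
fitq <- glmQLFit(yNorm, design)

# List the components stored in the fitted model object
names(fitq)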
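To make the contrast between the two fitting functions concrete, the following is a minimal, self-contained sketch that fits both models to the same data. It does not use the tumor/normal data set from this chapter: the count matrix, the sample labels, and the object names (counts, group, y, fit_ql, fit_nb) are hypothetical and simulated purely for illustration. The sketch also assumes the usual edgeR pairing of each fit with its own test, “glmQLFTest” for a “glmQLFit” fit and “glmLRT” for a “glmFit” fit.

```r
library(edgeR)

set.seed(1)

# Hypothetical toy data (an assumption, not the chapter's data set):
# 200 genes, 6 samples, 3 labelled "normal" and 3 labelled "tumor"
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group  <- factor(rep(c("normal", "tumor"), each = 3))
design <- model.matrix(~ group)

# Build the DGEList, normalize, and estimate dispersions
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# Fit the two alternative models to the same data
fit_ql <- glmQLFit(y, design)  # quasi-likelihood negative binomial GLM
fit_nb <- glmFit(y, design)    # negative binomial GLM

# Each fit is followed by its matching test for the group coefficient
res_ql <- glmQLFTest(fit_ql, coef = 2)  # quasi-likelihood F-test
res_nb <- glmLRT(fit_nb, coef = 2)      # likelihood ratio test

# Compare the components of the two fits and the top-ranked genes
names(fit_ql)
names(fit_nb)
topTags(res_ql, n = 5)
topTags(res_nb, n = 5)
```

Because the toy counts are drawn from one distribution for all samples, few or no genes should appear significant. The point of the sketch is only to show that both functions accept the same inputs and return fitted objects that are passed on to different test functions.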

FIGURE 5.19 Heatmap clustering samples and top 10 variable genes.